Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification

Authors

  • Gazi N. Ali
  • Pei-Ju Chiang
  • Aravind K. Mikkilineni
  • George T. Chiu
  • Edward J. Delp
  • Jan P. Allebach
Abstract

Printer identification based on a printed document has many desirable forensic applications. In the electrophotographic (EP) process, quasiperiodic banding artifacts can be used as an effective intrinsic signature. However, in text-only documents, the absence of large midtone areas makes it difficult to capture signals suitable for banding detection. Frequency domain analysis of the projection signals of individual characters does not provide enough resolution for reliable printer identification. Advanced pattern recognition techniques, combined with knowledge of the print mechanism, can help us devise an appropriate method to detect these signatures. We can obtain reliable intrinsic signatures from multiple projections and use them to build a classifier that identifies the printer. The projections from individual characters can be viewed as a high-dimensional data set. To create an effective pattern recognition tool, this high-dimensional projection data has to be represented in a low-dimensional space. The dimension reduction can be performed with well-known pattern recognition techniques, and a classifier can then be built on the reduced-dimension data set. A popular choice is the Gaussian mixture model (GMM), in which each printer is represented by a Gaussian distribution. The distributions of all the printers help us determine the mixing coefficients for the projections from an unknown printer. Finally, a decision-making algorithm votes for the correct printer. In this paper we describe different classification algorithms for identifying an unknown printer. We present experiments based on several different EP printers in our printer bank and compare the classification results of the different classifiers.∗

∗This research is supported by a grant from the National Science Foundation under award number 0219893.

Introduction

In our previous work, we described the intrinsic and extrinsic features that can be used for printer identification.
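The classification pipeline described in the abstract can be sketched in Python with NumPy. It has four steps: project each character, reduce the projections with PCA, model each printer with one Gaussian, and let a decision rule pick the printer. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical: the synthetic "projection signals" (a sinusoidal banding profile plus white noise), the banding frequencies, the signal length of 40 samples, the choice of k = 4 components, and every helper name.

Two decision rules are shown. The first is a majority vote over per-character maximum-likelihood labels. The second is an EM estimate of the mixing coefficients with the printer Gaussians held fixed, which is one possible reading of how "the distributions of all the printers help us determine the mixing coefficients."

```python
import numpy as np


def fit_pca(X, k):
    """Return the mean and the top-k principal axes of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def project(X, mean, axes):
    """Map each projection signal (one row of X) to its k PCA scores."""
    return (X - mean) @ axes.T


def fit_gaussian(Z, reg=1e-6):
    """Fit one full-covariance Gaussian to one printer's reduced features."""
    mu = Z.mean(axis=0)
    # A small ridge (assumed value) keeps the covariance invertible.
    cov = np.cov(Z, rowvar=False) + reg * np.eye(Z.shape[1])
    return mu, cov


def gaussian_logpdf(Z, mu, cov):
    """Log density of N(mu, cov) evaluated at each row of Z."""
    diff = Z - mu
    _, logdet = np.linalg.slogdet(cov)
    # Row-wise Mahalanobis distance diff_i^T cov^{-1} diff_i.
    maha = np.einsum("ij,ij->i", np.linalg.solve(cov, diff.T).T, diff)
    return -0.5 * (Z.shape[1] * np.log(2 * np.pi) + logdet + maha)


def mixing_weights(logp, iters=200):
    """EM over the mixing coefficients only; printer densities stay fixed.

    logp[n, j] is the log-likelihood of character n under printer j.
    """
    m = logp.shape[1]
    pi = np.full(m, 1.0 / m)
    for _ in range(iters):
        # E-step: responsibilities, computed stably in the log domain
        # (the clip avoids log(0) if a weight underflows).
        a = logp + np.log(np.clip(pi, 1e-300, None))
        a -= a.max(axis=1, keepdims=True)
        r = np.exp(a)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each mixing coefficient becomes its average responsibility.
        pi = r.mean(axis=0)
    return pi


def majority_vote(logp):
    """Each character votes for its maximum-likelihood printer."""
    votes = np.bincount(logp.argmax(axis=1), minlength=logp.shape[1])
    return int(votes.argmax())


# Synthetic demo (hypothetical data): each printer's 40-sample character
# projection is a sinusoidal banding profile plus white noise.
rng = np.random.default_rng(0)
t = np.arange(40)
freqs = [0.10, 0.13, 0.16]  # hypothetical banding frequencies, cycles/sample


def sample(f, n):
    return np.sin(2 * np.pi * f * t) + 0.3 * rng.standard_normal((n, t.size))


train = [sample(f, 200) for f in freqs]
mean, axes = fit_pca(np.vstack(train), k=4)
models = [fit_gaussian(project(Xj, mean, axes)) for Xj in train]

# Unknown document: 30 characters that were actually printed by printer 1.
Z_unknown = project(sample(freqs[1], 30), mean, axes)
logp = np.column_stack(
    [gaussian_logpdf(Z_unknown, mu, cov) for mu, cov in models]
)

pi = mixing_weights(logp)
winner_em = int(np.argmax(pi))
winner_vote = majority_vote(logp)
print("mixing weights:", np.round(pi, 3), "EM:", winner_em, "vote:", winner_vote)
```

Holding the component densities fixed means that only the mixing weights are re-estimated for the unknown document, so the printer with the largest weight is a natural decision. With real data, k would be chosen so that the retained components separate the printers.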
Our intrinsic feature extraction method is based on frequency domain analysis of the one-dimensional projected signal. If the projected signal contains a sufficient number of samples, the Fourier transform gives us the correct banding frequency. When we work with a text-only document, our objective is to obtain the banding frequency from the projected signals of individual letters. In this situation, each projection has too few samples to give high resolution in the frequency domain. In addition, the spectra from different printers overlap significantly, which makes the spectrum alone difficult to use as an effective classification method.

The printer identification, or classification, task is closely related to various pattern identification and pattern recognition techniques. The intrinsic features are the patterns used to recognize an unknown printer. The basic idea is to create a classifier that uses the intrinsic signatures from a document to make a correct identification. A Gaussian mixture model (GMM) or a tree-based classifier is suitable for the classification step, but the initial dimension reduction is performed by principal component analysis (PCA).

PCA is often used as a dimension-reducing technique within some other type of analysis. Classical PCA is a linear transform that maps the data into a lower-dimensional space while preserving as much of the data variance as possible. In intrinsic feature extraction, PCA can be used to reduce the dimension of the projected signal. The number of components can be chosen so that the components discriminate between different printers. These components are the features used by the classifier (GMM or tree classifier).

The Gaussian mixture model defines the overall data set as a combination of several different Gaussian distributions. The parameters of the model are determined by the

Similar resources

IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL

Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we fit a Gaussian mixture model to the pixels of an image. The parameters of the model are estimated by the EM algorithm. In addition, each pixel of the true image is labeled using Bayes' rule. In fact, ...

Image Segmentation using Gaussian Mixture Model

Abstract: Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we applied a Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by the EM algorithm. In addition, each pixel of the true image was labeled using Bayes' rule. In fact, ...

Patterns Prediction of Chemotherapy Sensitivity in Cancer Cell lines Using FTIR Spectrum, Neural Network and Principal Components Analysis

    Drug resistance enables cancer cells to escape the cytotoxic effect of anticancer drugs. Identifying the resistant phenotype is very important because it can lead to an effective treatment plan. There is interest in developing models that classify the resistance phenotype based on multivariate data. We have investigated a vibrational spectroscopic approach in order to characterize a...

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is the negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown its successful application to data classification. Most negative selection methods use deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...

Publication date: 2004